Investigating Redundancy in Emoji Use: Study on a Twitter Based Corpus

Authors

  • Giulia Donato
  • Patrizia Paggio
Abstract

In this paper we present an annotated corpus created to analyze the informative behaviour of emoji, an issue of importance for sentiment analysis and natural language processing. The corpus consists of 2475 tweets, all containing at least one emoji, which has been annotated with one of three possible classes: Redundant, Non Redundant, and Non Redundant + POS. We explain how the corpus was collected and describe the annotation procedure and the interface developed for the task. We provide an analysis of the corpus, also considering possible predictive features, discuss the problematic aspects of the annotation, and suggest future improvements.

Similar resources

EmojiNet: An Open Service and API for Emoji Sense Discovery

This paper presents the release of EmojiNet, the largest machine-readable emoji sense inventory that links Unicode emoji representations to their English meanings extracted from the Web. EmojiNet is a dataset consisting of: (i) 12,904 sense labels over 2,389 emoji, which were extracted from the web and linked to machine-readable sense definitions seen in BabelNet; (ii) context words associated ...

Emoji as Emotion Tags for Tweets

In many natural language processing tasks, supervised machine learning approaches have proved most effective, and substantial effort has been made into collecting and annotating corpora for building such models. Emotion detection from text is no exception; however, research in this area is in its relative infancy, and few emotion annotated corpora exist to date. A further issue regarding the de...

Signals Revealing Street Gang Members on Twitter

We study the problem of automatically finding gang member profiles on Twitter. We outline a process to curate one of the largest sets of verifiable gang member profiles that has ever been studied. A review of these profiles establishes differences in the language, images, YouTube links, and emoji features gang members use compared to the rest of the Twitter population. We generate word embeddin...

Emoticons vs. Emojis on Twitter: A Causal Inference Approach

Online writing lacks the non-verbal cues present in face-to-face communication, which provide additional contextual information about the utterance, such as the speaker’s intention or affective state. To fill this void, a number of orthographic features, such as emoticons, expressive lengthening, and non-standard punctuation, have become popular in social media services including Twitter and Ins...

Emotion Analysis of Twitter Data That Use Emoticons and Emoji Ideograms

Twitter is an online social networking service on which users worldwide publish their opinions on a variety of topics, discuss current issues, complain, and express many kinds of emotions. Therefore, Twitter is a rich source of data for opinion mining, sentiment and emotion analysis. This paper focuses on this issue by analysing symbols called emotion tokens, including emotion symbols (e.g. emo...

Publication date: 2017